| | TransactionID | CustomerID | CustomerDOB | CustGender | CustLocation | CustAccountBalance | TransactionDate | TransactionAmount (INR) | CustomerAge |
|---|---|---|---|---|---|---|---|---|---|
| 0 | T1 | C5841053 | 1994-01-10 | 0 | JAMSHEDPUR | 17819.05 | 2016-08-02 | 25.0 | 22 |
| 1 | T2 | C2142763 | 1957-04-04 | 1 | JHAJJAR | 2270.69 | 2016-08-02 | 27999.0 | 59 |
| 2 | T3 | C4417068 | 1996-11-26 | 0 | MUMBAI | 17874.44 | 2016-08-02 | 459.0 | 20 |
| 3 | T4 | C5342380 | 1973-09-14 | 0 | MUMBAI | 866503.21 | 2016-08-02 | 2060.0 | 43 |
| 4 | T5 | C9031234 | 1988-03-24 | 0 | NAVI MUMBAI | 6714.43 | 2016-08-02 | 1762.5 | 28 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1048562 | T1048563 | C8020229 | 1990-04-08 | 1 | NEW DELHI | 7635.19 | 2016-09-18 | 799.0 | 26 |
| 1048563 | T1048564 | C6459278 | 1992-02-20 | 1 | NASHIK | 27311.42 | 2016-09-18 | 460.0 | 24 |
| 1048564 | T1048565 | C6412354 | 1989-05-18 | 1 | HYDERABAD | 221757.06 | 2016-09-18 | 770.0 | 27 |
| 1048565 | T1048566 | C6420483 | 1978-08-30 | 1 | VISAKHAPATNAM | 10117.87 | 2016-09-18 | 1000.0 | 38 |
| 1048566 | T1048567 | C8337524 | 1984-03-05 | 1 | PUNE | 75734.42 | 2016-09-18 | 1166.0 | 32 |
984614 rows × 9 columns
The dataset consists of over 1 million transactions by over 800k customers from a bank in India. It covers nearly three months in 2016, from August 1st to October 21st.
1. Age Distribution
The majority of customers are between 20 and 40 years old, with the distribution peaking at age 26.
2. Gender Distribution
Male customers outnumber female customers by almost three to one.
3. The Most Frequent 20 Locations
Even when combined, the transactions from all regions outside the top 5 are significantly fewer than those from Mumbai.

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Frequency | 838561.0 | 1.174171 | 0.434989 | 1.00 | 1.0 | 1.0 | 1.0 | 6.00 |
| Monetary | 838561.0 | 1706.621763 | 6689.594162 | 0.01 | 199.0 | 500.0 | 1420.0 | 1560034.99 |
| Recency | 838561.0 | 55.407019 | 15.219939 | 0.00 | 43.0 | 55.0 | 68.0 | 81.00 |
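
The Frequency, Monetary, and Recency features summarized above can be derived per customer from the transaction table. The following is a minimal sketch, not necessarily the exact computation used here. It assumes that Frequency is the number of transactions, Monetary is the total transaction amount, and Recency is the number of days between a customer's last transaction and the latest date in the data. The toy DataFrame `tx` and the names `snapshot` and `rfm` are illustrative.

```python
import pandas as pd

# Toy transactions; column names follow the transaction table, values are made up.
tx = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3", "T4"],
    "CustomerID": ["C1", "C1", "C2", "C3"],
    "TransactionDate": pd.to_datetime(
        ["2016-08-02", "2016-08-20", "2016-08-10", "2016-09-18"]
    ),
    "TransactionAmount (INR)": [25.0, 100.0, 459.0, 770.0],
})

# Reference date: the latest transaction date in the data (an assumption).
snapshot = tx["TransactionDate"].max()

# One row per customer: transaction count, total amount, last transaction date.
rfm = tx.groupby("CustomerID").agg(
    Frequency=("TransactionID", "count"),
    Monetary=("TransactionAmount (INR)", "sum"),
    LastDate=("TransactionDate", "max"),
)

# Recency in days since the customer's last transaction.
rfm["Recency"] = (snapshot - rfm["LastDate"]).dt.days
rfm = rfm.drop(columns="LastDate")

print(rfm.describe().T)
```

Calling `describe().T` on the result gives a summary in the same layout as the table above.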
Monetary VS. Recency
For most customers, the monetary value remains low. This suggests that most of the bank's customers are low-income customers who opened accounts mainly to deposit money.
Frequency
Over the three months, it appears that nearly all customers made only a single transaction with the bank rather than transacting regularly.
Recency
The recency boxplot leads to the same conclusion as the frequency plot.

| | Frequency | CustGender | CustAccountBalance | Monetary | CustomerAge | Recency |
|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 120180.54 | 5106.0 | 24 | 25 |
| 1 | 1 | 1 | 24204.49 | 1499.0 | 22 | 68 |
| 2 | 2 | 0 | 161848.76 | 1455.0 | 24 | 75 |
| 3 | 1 | 0 | 496.18 | 30.0 | 26 | 36 |
| 4 | 1 | 1 | 87058.65 | 5000.0 | 51 | 64 |
| ... | ... | ... | ... | ... | ... | ... |
| 838556 | 1 | 1 | 133067.23 | 691.0 | 26 | 75 |
| 838557 | 1 | 1 | 96063.46 | 222.0 | 20 | 36 |
| 838558 | 1 | 1 | 5559.75 | 126.0 | 23 | 64 |
| 838559 | 1 | 1 | 35295.92 | 50.0 | 21 | 54 |
| 838560 | 1 | 1 | 6968.93 | 855.0 | 34 | 26 |
838561 rows × 6 columns
| | Frequency | CustGender | CustAccountBalance | Monetary | CustomerAge | Recency |
|---|---|---|---|---|---|---|
| 0 | 1.898505 | -1.615003 | 0.016841 | 0.508159 | -0.803137 | -1.997842 |
| 1 | -0.400403 | 0.619194 | -0.098354 | -0.031037 | -1.031624 | 0.827401 |
| 2 | 1.898505 | -1.615003 | 0.066852 | -0.037614 | -0.803137 | 1.287324 |
| 3 | -0.400403 | -1.615003 | -0.126810 | -0.250631 | -0.574651 | -1.275106 |
| 4 | -0.400403 | 0.619194 | -0.022914 | 0.492314 | 2.281429 | 0.564587 |
| ... | ... | ... | ... | ... | ... | ... |
| 838556 | -0.400403 | 0.619194 | 0.032308 | -0.151821 | -0.574651 | 1.287324 |
| 838557 | -0.400403 | 0.619194 | -0.012106 | -0.221930 | -1.260110 | -1.275106 |
| 838558 | -0.400403 | 0.619194 | -0.120732 | -0.236281 | -0.917380 | 0.564587 |
| 838559 | -0.400403 | 0.619194 | -0.085042 | -0.247642 | -1.145867 | -0.092446 |
| 838560 | -0.400403 | 0.619194 | -0.119041 | -0.127306 | 0.339295 | -1.932139 |
838561 rows × 6 columns
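
The second table appears to be the first one after z-score standardization, where each column is rescaled to mean 0 and unit standard deviation. For example, a Frequency of 2 maps to (2 - 1.174171) / 0.434989 ≈ 1.8985, which matches the first row. Below is a minimal sketch using scikit-learn's `StandardScaler`. That this particular scaler was used is an assumption. Because the sketch fits on five rows only, its output will not reproduce the numbers in the table.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five rows of the unscaled feature table, used here as toy input.
features = pd.DataFrame({
    "Frequency": [2, 1, 2, 1, 1],
    "CustGender": [0, 1, 0, 0, 1],
    "CustAccountBalance": [120180.54, 24204.49, 161848.76, 496.18, 87058.65],
    "Monetary": [5106.0, 1499.0, 1455.0, 30.0, 5000.0],
    "CustomerAge": [24, 22, 24, 26, 51],
    "Recency": [25, 68, 75, 36, 64],
})

# Fit on the data and rescale every column to mean 0 and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(features),
    columns=features.columns,
    index=features.index,
)

print(scaled.round(3))
```

Scaling matters for distance-based clustering: without it, CustAccountBalance would dominate the distances simply because its values are orders of magnitude larger than the others.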
Elbow Method Plot
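
The elbow method fits k-means for a range of cluster counts and tracks the inertia (within-cluster sum of squared distances). The point where the curve stops dropping sharply suggests a reasonable number of clusters. The following is a minimal sketch using scikit-learn's `KMeans`. The synthetic data `X` and the range of `k` values are illustrative stand-ins, not the data or settings used here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the scaled feature matrix: three well-separated blobs.
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(100, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

# Fit k-means for each candidate k and record the inertia.
ks = list(range(1, 8))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

Plotting `ks` against `inertias` gives the elbow curve. On this toy data the sharp drop ends at `k = 3`, the number of blobs generated.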


Agglomerative Clustering Method
| label | count |
|---|---|
| 2 | 235782 |
| 4 | 224744 |
| 1 | 181287 |
| 6 | 122568 |
| 0 | 70932 |
| 5 | 3115 |
| 3 | 133 |
The table shows that Clusters 3 and 5 contain only a small fraction of the customers (133 and 3,115 out of 838,561, respectively) compared to the other clusters.
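
A label-count table like the one above can be produced by fitting an agglomerative model and counting the assigned labels. The following is a minimal sketch using scikit-learn's `AgglomerativeClustering`. The Ward linkage and the synthetic matrix `X` are assumptions. Only `n_clusters=7` is taken from the seven labels (0 to 6) in the table. Agglomerative clustering generally scales poorly in memory with the number of rows, so the sketch runs on a small sample. How the full dataset was handled is not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Synthetic stand-in for a sample of the six scaled features.
X = rng.normal(size=(300, 6))

# n_clusters=7 mirrors the seven labels in the table; Ward linkage is an assumption.
model = AgglomerativeClustering(n_clusters=7, linkage="ward")
labels = model.fit_predict(X)

# Number of points assigned to each cluster, largest first.
counts = pd.Series(labels, name="label").value_counts()
print(counts)
```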
Clusters 1, 2, and 4: Since these clusters are similar in terms of recency and diversity, the bank could recommend a credit card with rewards or cash back. This would incentivize customers to use the credit card frequently and potentially increase their account balances over time.

Cluster 6: Given that this group comprises a higher-aged population and is better off, the bank could recommend a savings account with a higher interest rate. This would appeal to customers who are more financially stable and may be looking for a low-risk investment option.

Cluster 0: Since this group tends to have more transactions with the bank, the bank could recommend a checking account with no monthly maintenance fees. This would be an attractive option for customers who frequently use their checking account and want to avoid additional fees. Additionally, the bank could offer overdraft protection to prevent customers from incurring fees for overdrawing their account.
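
Descriptions such as "higher-aged", "better off", or "more transactions" are typically read off per-cluster feature averages. The following is a minimal sketch of such a profile. The DataFrame `customers`, its values, and the label assignments are made up for illustration and do not reproduce the actual clusters.

```python
import pandas as pd

# Toy customer features with hypothetical cluster labels; all values are made up.
customers = pd.DataFrame({
    "Frequency": [1, 1, 3, 2, 1, 1],
    "CustAccountBalance": [5000.0, 7000.0, 20000.0, 30000.0, 500000.0, 700000.0],
    "CustomerAge": [24, 26, 30, 32, 55, 61],
    "label": [1, 1, 0, 0, 6, 6],
})

# Mean of each feature per cluster, plus the cluster size.
profile = customers.groupby("label").mean()
profile["n_customers"] = customers.groupby("label").size()

print(profile)
```

Comparing the rows of `profile` shows which clusters stand out on age, balance, or transaction frequency, which is the kind of evidence the recommendations above rely on.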